AITopics | cross-modal retrieval

cross-modal retrieval

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Easy Regional Contrastive Learning of Expressive Fashion Representations

Neural Information Processing Systems | Mar-19-2026, 02:34:58 GMT

When learning vision-language models (VLMs) for the fashion domain, most existing works design new architectures from vanilla BERT with additional objectives, or perform dense multi-task learning with fashion-specific tasks. Though progress has been made, their architectures or objectives are often intricate and their extensibility is limited. By contrast, with a simple architecture (comprising only two unimodal encoders) and just the contrastive objective, popular pre-trained VL models (e.g., CLIP) achieve superior performance in general domains and are easily extended to downstream tasks. However, inheriting such benefits of CLIP in the fashion domain is non-trivial in the presence of the notable domain gap. Empirically, we find that directly finetuning on fashion data leads CLIP to frequently ignore minor yet important details such as logos and composition, which are critical in fashion tasks such as retrieval and captioning. In this work, to maintain CLIP's simple architecture and objective while explicitly attending to fashion details, we propose $E^2$: Easy Regional Contrastive Learning of Expressive Fashion Representations. $E^2$
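The "contrastive objective" that the abstract attributes to CLIP-style models can be illustrated with a small sketch. This is a generic symmetric InfoNCE loss over paired image and text embeddings, written in plain Python for readability; it is not the $E^2$ method, and the function name `clip_style_loss` and the temperature value 0.07 are assumptions chosen for illustration.

```python
import math

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings (illustrative)."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    imgs = [normalize(v) for v in image_emb]
    txts = [normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: temperature-scaled cosine similarity of image i and text j
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # Mean negative log-softmax of the diagonal (matched-pair) entries.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Image-to-text uses rows, text-to-image uses columns.
    cols = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(cols))

# Matched pairs should score a much lower loss than mismatched pairs.
aligned = clip_style_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = clip_style_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

The loss pulls each image toward its paired text and away from the other texts in the batch, which is why only two unimodal encoders are needed.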

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

An End-To-End Graph Attention Network Hashing for Cross-Modal Retrieval

Neural Information Processing Systems | Mar-17-2026, 20:28:21 GMT

Due to its low storage cost and fast search speed, cross-modal retrieval based on hashing has attracted widespread attention and is widely used in real-world applications of social media search. However, most existing hashing methods are limited by incomplete feature representations and semantic associations, which greatly restrict their performance and applicability in practice. To deal with this challenge, in this paper, we propose an end-to-end graph attention network hashing (EGATH) for cross-modal retrieval, which can not only capture direct semantic associations between images and texts but also match semantic content between different modalities. We adopt contrastive language-image pretraining (CLIP) combined with the Transformer to improve understanding and generalization ability in semantic consistency across different data modalities. A classifier based on a graph attention network is applied to obtain predicted labels to enhance cross-modal feature representation. We construct hash codes using an optimization strategy and loss function to preserve the semantic information and compactness of the hash code. Comprehensive experiments on the NUS-WIDE, MIRFlickr25K, and MS-COCO benchmark datasets show that our EGATH significantly outperforms several state-of-the-art methods.
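The low storage cost and fast search speed mentioned above come from comparing short binary codes by Hamming distance. A minimal sketch, assuming sign-based binarization of real-valued features; the helper names `to_hash_code`, `hamming`, and `retrieve` are hypothetical and this is not the EGATH pipeline itself:

```python
def to_hash_code(features):
    """Binarize a real-valued feature vector by sign (1 if positive, else 0)."""
    return [1 if x > 0 else 0 for x in features]

def hamming(a, b):
    """Count the positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, database_codes, top_k=2):
    """Return indices of the top_k database codes nearest in Hamming distance."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming(query_code, database_codes[i]))
    return order[:top_k]

# A text query code searched against three image codes (toy values).
query = to_hash_code([0.9, -0.2, 0.4, -0.7])
image_codes = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 0]]
nearest = retrieve(query, image_codes)
```

In practice the codes are packed into machine words so the distance reduces to an XOR and a bit count; the list form here only keeps the idea visible.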

artificial intelligence, name change, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

4d893f766ab60e5337659b9e71883af4-Paper-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 14:35:41 GMT

artificial intelligence, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
(2 more...)

2492288f6878e6f99124b362604e58f5-Paper-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 13:56:08 GMT

information, selection token, tag entity, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > Canada > Ontario > Toronto (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching

Neural Information Processing Systems | Feb-2-2026, 06:21:46 GMT

Binary source code matching, especially at the function level, plays a critical role in the field of computer security. Given binary code only, finding the corresponding source code improves the accuracy and efficiency of reverse engineering. Given source code only, retrieving related binary code helps confirm known vulnerabilities. However, due to the vast difference between source and binary code, few studies have investigated binary source code matching. Previously published studies focus on extracting code literals such as strings and integers, then use traditional matching algorithms such as the Hungarian algorithm for code matching.
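The literal-matching baseline described above amounts to a minimum-cost one-to-one assignment between literals found in source code and in binary code. The sketch below solves that same assignment problem by brute force over permutations, which is only practical for tiny inputs; it is a stand-in for the Hungarian algorithm, not an implementation of it. The cost function (absolute difference between integer literals) and the sample literals are assumptions made for illustration.

```python
from itertools import permutations

def best_assignment(cost):
    """Minimum-cost one-to-one assignment for a square cost matrix (brute force)."""
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

# Hypothetical integer literals extracted from a source and a binary function.
source_literals = [10, 255, 4096]
binary_literals = [4096, 8, 255]
cost = [[abs(s - b) for b in binary_literals] for s in source_literals]

# assignment[i] is the index of the binary literal matched to source literal i.
assignment, total = best_assignment(cost)
```

A low total cost suggests the two functions share most of their literals; the Hungarian algorithm reaches the same optimum in polynomial time.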

artificial intelligence, cross-modal retrieval, machine learning, (10 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.59)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.63)

An End-to-End Graph Attention Network Hashing for Cross-Modal Retrieval

Neural Information Processing Systems | Dec-27-2025, 21:00:20 GMT

Due to its low storage cost and fast search speed, cross-modal retrieval based on hashing has attracted widespread attention and is widely used in real-world applications of social media search.

hash code, information, retrieval, (15 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)
Asia > Singapore (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (0.68)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.94)
(6 more...)

Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval

Neural Information Processing Systems | Dec-25-2025, 04:12:23 GMT

Vision and diverse languages are important information sources in our living world. A model that understands multi-modalities and multi-languages can be applied to a wider range of real-life scenarios. To build such a multimodal and multilingual model, existing works try to ensemble vision-language data from multiple languages in pre-training. However, due to the large number of languages, these works often require huge computing resources and cannot be flexibly extended to new languages. In this work, we propose a MultiLingual Acquisition (MLA) framework that can easily empower a monolingual Vision-Language Pre-training (VLP) model with multilingual capability. Specifically, we design a lightweight language acquisition encoder based on state-of-the-art monolingual VLP models. We further propose a two-stage training strategy to optimize the language acquisition encoder, namely the Native Language Transfer stage and the Language Exposure stage. With much less multilingual training data and computing resources, our model achieves state-of-the-art performance on multilingual image-text and video-text retrieval benchmarks.

acquisition, multi-lingual acquisition, multimodal pre-training, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

Neural Information Processing Systems | Dec-25-2025, 03:13:32 GMT

Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity. Concretely, we first construct a set of various learnable prototypes for each modality to represent the entire semantic subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet distribution parameters. The PAU model induces accurate uncertainty and reliable predictions for cross-modal retrieval. Extensive experiments are performed on four major benchmark datasets, MSR-VTT, MSVD, DiDeMo, and MS-COCO, demonstrating the effectiveness of our method. The code is accessible at https://github.com/leolee99/PAU.
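Associating evidence with Dirichlet parameters is commonly done in Subjective Logic by setting each parameter to the evidence plus one, which yields per-class belief masses and a single uncertainty mass that together sum to one. The sketch below shows that standard mapping only; whether PAU uses exactly this form is not stated in the abstract, and the function name `subjective_opinion` is hypothetical.

```python
def subjective_opinion(evidence):
    """Map non-negative per-class evidence to belief masses and an uncertainty mass."""
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]        # Dirichlet parameters: alpha_k = e_k + 1
    strength = sum(alpha)                      # Dirichlet strength S
    belief = [e / strength for e in evidence]  # b_k = e_k / S
    uncertainty = k / strength                 # u = K / S
    return belief, uncertainty

# No evidence at all gives total uncertainty; strong evidence shrinks it.
b_none, u_none = subjective_opinion([0.0, 0.0, 0.0])
b_strong, u_strong = subjective_opinion([9.0, 0.0, 0.0])
```

Under this mapping, low-quality inputs that yield little evidence for any prototype end up with a large uncertainty mass, which is the behaviour the abstract is after.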

cross-modal retrieval, name change, prototype-based aleatoric uncertainty quantification, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.61)

A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval

Neural Information Processing Systems | Dec-24-2025, 04:23:07 GMT

Cross-modal retrieval aims to build correspondence between multiple modalities by learning a common representation space. Typically, an image can match multiple texts semantically and vice versa, which significantly increases the difficulty of this task. To address this problem, probabilistic embedding is proposed to quantify these many-to-many relationships. However, existing datasets (e.g., MS-COCO) and metrics (e.g., Recall@K) cannot fully represent these diverse correspondences due to non-exhaustive annotations. Based on this observation, we utilize semantic correlation computed by CIDEr to find the potential correspondences. Then we present an effective metric, named Average Semantic Precision (ASP), which can measure the ranking precision of semantic correlation for retrieval sets. Additionally, we introduce a novel and concise objective, coined Differentiable ASP Approximation (DAA).
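To see why Recall@K depends on exhaustive annotation, it helps to look at how it is usually computed in retrieval benchmarks: a query scores a hit only if an annotated match appears in the top K results. A minimal sketch of that common convention follows; the function names and toy rankings are assumptions, and ASP itself is not reproduced here because the abstract does not define it.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any annotated match appears in the top-k results, else 0.0."""
    return 1.0 if any(r in relevant_ids for r in ranked_ids[:k]) else 0.0

def mean_recall_at_k(rankings, ground_truth, k):
    """Average the per-query hit indicator over all queries."""
    scores = [recall_at_k(r, g, k) for r, g in zip(rankings, ground_truth)]
    return sum(scores) / len(scores)

# Two queries, each with a ranked list of candidate ids and one annotated match.
rankings = [[3, 1, 2], [2, 3, 1]]
ground_truth = [{1}, {1}]
r_at_1 = mean_recall_at_k(rankings, ground_truth, 1)
r_at_2 = mean_recall_at_k(rankings, ground_truth, 2)
r_at_3 = mean_recall_at_k(rankings, ground_truth, 3)
```

If candidate 3 were a semantically valid match for the first query but simply not annotated, it would still count as a miss at K=1, which is the gap the abstract points to.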

differentiable semantic metric approximation, name change, probabilistic embedding, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.80)


734abb86d3caa949f44da8a093717f61-Paper-Datasets_and_Benchmarks_Track.pdf
